#First need to download this Libraries for Visualization or interactive plots
# Load Required Libraries
library(ggplot2)
library(dplyr)
library(tidyr)
#install.packages("reshape2")
library(reshape2)
#install.packages("GGally")
library(GGally)
#install.packages("zoo")
library(zoo)
#install.packages("ggmap")
library(ggmap)
#install.packages("leaflet")
library(leaflet)
#install.packages("plotly")
library(plotly)
#install.packages("mapview")
library(mapview)
# Set working directory
setwd("D:/git/Analysis of Crime Data in Colchester")
# Read the crime data
crime23_data <- read.csv("crime23.csv")
The full analysis of the provided data has been undertaken and a report has been produced considering all the variants. The conclusion has been drawn at the end of this analysis.
In this report, we’re going to explore the Crime23 data set, which contains information about crimes in a specific area during a certain time period. By using data visualization, we may gain a better knowledge of the distribution of crimes across different categories and high-risk areas, as well as look at the link between different crime types and their effects.We’ll use graphs and charts to see if there are any patterns or trends in how and when these crimes happen. By doing this, we hope to learn more about why crimes occur and how we can prevent them from happening in the future.
# Two-way table of Crime Category and Outcome Status
two_way_table <- table(crime23_data$category, crime23_data$outcome_status)
two_way_table
Action to be taken by another organisation
anti-social-behaviour 0
bicycle-theft 0
burglary 0
criminal-damage-arson 5
drugs 2
other-crime 1
other-theft 1
possession-of-weapons 1
public-order 6
robbery 0
shoplifting 5
theft-from-the-person 0
vehicle-crime 0
violent-crime 83
Awaiting court outcome Court result unavailable
anti-social-behaviour 0 0
bicycle-theft 0 1
burglary 1 15
criminal-damage-arson 31 22
drugs 18 17
other-crime 6 3
other-theft 2 6
possession-of-weapons 9 10
public-order 18 17
robbery 4 4
shoplifting 51 45
theft-from-the-person 0 0
vehicle-crime 6 2
violent-crime 114 64
Formal action is not in the public interest
anti-social-behaviour 0
bicycle-theft 0
burglary 0
criminal-damage-arson 0
drugs 1
other-crime 0
other-theft 1
possession-of-weapons 0
public-order 2
robbery 0
shoplifting 1
theft-from-the-person 0
vehicle-crime 0
violent-crime 4
Further action is not in the public interest
anti-social-behaviour 0
bicycle-theft 0
burglary 2
criminal-damage-arson 2
drugs 10
other-crime 6
other-theft 1
possession-of-weapons 1
public-order 12
robbery 0
shoplifting 11
theft-from-the-person 0
vehicle-crime 0
violent-crime 37
Further investigation is not in the public interest
anti-social-behaviour 0
bicycle-theft 0
burglary 0
criminal-damage-arson 0
drugs 0
other-crime 8
other-theft 0
possession-of-weapons 0
public-order 0
robbery 0
shoplifting 0
theft-from-the-person 0
vehicle-crime 0
violent-crime 0
Investigation complete; no suspect identified
anti-social-behaviour 0
bicycle-theft 216
burglary 154
criminal-damage-arson 363
drugs 15
other-crime 13
other-theft 350
possession-of-weapons 9
public-order 193
robbery 38
shoplifting 299
theft-from-the-person 61
vehicle-crime 350
violent-crime 595
Local resolution Offender given a caution
anti-social-behaviour 0 0
bicycle-theft 1 0
burglary 0 0
criminal-damage-arson 14 4
drugs 98 6
other-crime 0 2
other-theft 1 2
possession-of-weapons 11 6
public-order 7 1
robbery 1 0
shoplifting 34 5
theft-from-the-person 0 0
vehicle-crime 1 0
violent-crime 71 35
Status update unavailable
anti-social-behaviour 0
bicycle-theft 8
burglary 13
criminal-damage-arson 6
drugs 9
other-crime 10
other-theft 12
possession-of-weapons 5
public-order 14
robbery 5
shoplifting 4
theft-from-the-person 2
vehicle-crime 5
violent-crime 84
Suspect charged as part of another case
anti-social-behaviour 0
bicycle-theft 0
burglary 0
criminal-damage-arson 0
drugs 0
other-crime 0
other-theft 0
possession-of-weapons 0
public-order 0
robbery 0
shoplifting 1
theft-from-the-person 0
vehicle-crime 0
violent-crime 0
Unable to prosecute suspect Under investigation
anti-social-behaviour 0 0
bicycle-theft 8 1
burglary 20 20
criminal-damage-arson 117 17
drugs 13 19
other-crime 34 9
other-theft 94 21
possession-of-weapons 12 10
public-order 218 44
robbery 35 7
shoplifting 76 22
theft-from-the-person 10 3
vehicle-crime 23 19
violent-crime 1299 247
I see a remarkably high number of 1299 violent crime incidents where we were “unable to prosecute the suspect.” Furthermore, 247 incidents of violent crimes are “still under active investigation.” Alarmingly, there were 595 violent crime incidents where “our investigation was complete but no suspect could be identified at all.” These numbers show that it is extremely difficult to find, capture, and convict violent criminals. Shoplifting seems to be another problematic area, with 51 and 45 cases “awaiting court outcomes, potentially straining judicial resources.” Furthermore, 76 shoplifting cases resulted in “suspects going uncharged due to an inability to prosecute.”However, I note that 34 shoplifting incidents were resolved through local resolution measures.
Criminal damage and arson show 363 cases where no suspect was found despite completed investigations.”Unable to prosecute suspect” was an issue in 117 of these cases as well. A total of 218 instances involving “public order crimes” were resolved locally, while 44 cases remain open and “under investigation currently.” 19 ongoing cases are still under investigation, while 98 drug-related occurrences were settled locally.
It’s interesting to note that several crime categories, such as anti-social behavior and bicycle theft, had zero cases in result categories like “Formal action not in the public interest” or “Further investigation not in the public interest.” More research is needed to determine the causes of this selective filtering.
crime23_data %>%
count(category) %>%
ggplot(aes(x = category, y = n, color = category, group = 1)) +
geom_line() +
geom_point(size = 3) +
geom_text(aes(label = n), vjust = 2, color = "black") +
scale_x_discrete(guide = guide_axis(angle = 45)) +
scale_color_manual(values = rainbow(length(unique(crime23_data$category)))) + # Specify colors
labs(title = "Count of Each Category",
x = "Category",
y = "Count")
The most prevalent crime category is violent crime, with an incident count of 2,633. This alarmingly high number indicates that violent offenses should be the top priority for local authorities to address through enhanced prevention and intervention measures. Following violent crime, the next highest category is anti-social behavior, with 677 incidents. This suggests that issues related to public order disturbances and social problems are also a significant concern that requires focused attention. Vehicle-related crimes, including vehicle-person-theft with 406 incidents and bicycle-theft with 235 incidents, account for a combined total of 641 cases. This highlights the need for targeted strategies to prevent and reduce property crimes against vehicles.
Theft-related offenses, such as shoplifting with 554 incidents, other-theft with 491 incidents, and theft-from-the-person with 76 incidents, collectively make up a sizable portion of the crimes recorded. Addressing these types of theft should be an important focus area. Criminal damage and arson incidents total 581, while burglary accounts for 225 crimes, indicating the prevalence of property-related destruction and break-ins that require dedicated efforts.
Other noteworthy categories include public-order issues with 532 incidents, drug-related crimes with 208 incidents, and robbery with 94 incidents, all of which warrant attention and tailored interventions. This analysis can guide the allocation of resources, the implementation of targeted crime prevention strategies, and the overall enhancement of public safety.
outcome_status_plt <- ggplot(crime23_data, aes(x = outcome_status,fill = outcome_status)) +
geom_bar( width = 0.5) +
geom_text(stat = "count", aes(label = ..count..), vjust = -0.5, color = "black") +
scale_x_discrete(guide = guide_axis(angle = 30)) +
labs(title = "Distribution of outcomes",
x = "Outcome Status",
y = "Count") +
theme_minimal()
# theme(axis.text.x = element_text(angle = 10, vjust = 0.5))
print(outcome_status_plt)
The most common outcome is “Investigation complete: no suspect identified” with 2,656 incidents, indicating a lack of suspects in many cases.”Unable to prosecute suspect” follows with 1,959 incidents, showing challenges in prosecution. 677 incidences of “NA” (Not Available) denote incomplete outcome data. There are 439 incidents listed as “under investigation,” which represents pending cases. Other notable outcomes include “Suspect charged as part of another case” (1,959 incidents), “Status update unavailable” (177 incidents), and “Local resolution” (239 incidents). “Action to be taken by another organization” (104 incidents), “Formal action is not in the public interest” (9 incidents), and “Further action is not in the public interest” (82 incidents) are less common outcomes.
# Create pie chart data
crime23_data_pie <- as.data.frame(table(crime23_data$category))
# Assuming crime_data_pie$Freq contains the frequencies and crime_data_pie$Var1 contains the category labels
# Plotting pie chart with values
pie(crime23_data_pie$Freq, labels = paste(crime23_data_pie$Var1, "(", crime23_data_pie$Freq, ")", sep = ""),
main = "Crime Category Pie Chart",
cex.main = 1.5, cex = 1.2)
According to the pie chart, violent crimes constitute a significant portion, with a value of 2633. Shoplifting (554) and vehicle-related crimes (406) also have comparatively high rates. Other notable crimes include robbery (94) and theft-from-the-person (76). Weapons possession (74) and public order violations (532) are represented, pointing to problems with illegal and public safety. There are also general other crimes (92) and various types of stealing (491), indicating a variety of other illegal actions.
The number of drug-related crimes (208) is shown, indicating how common drug usage is and how many illegal activities are linked with it. Criminal damage-arson (581), which probably refers to intentional fire-setting acts and harm to property-is mentioned together with criminal damage.
Burglary (225) and bicycle theft (235) are separate categories, highlighting the occurrence of break-ins and theft of personal transportation modes. Interestingly, anti-social behavior (677) is very valuable and may include harassment violations, public problems, and disruptive behavior.
This data can be used to guide targeted strategies to increase community safety and address the root causes of criminal behavior.
# Dot plot (requires additional data manipulation)
dot_plt<-ggplot(crime23_data, aes(x = date, y = location_type)) +
geom_point() +
labs(title = "Dot Plot of Crime Incidents by Location Type in Colchester",
x = "Date",
y = "Location Type") +
theme_classic()
print(dot_plt)
In this dot plot of crime incidents by location type to be a compelling visual tool for understanding the distribution and patterns of criminal activities within a given timeframe. The plot effectively conveys the relationships between the date, location type, and the occurrence of crime incidents.
One of the most important findings is that criminal incidents are consistently present across the “Force” location type over the whole date range, with a relatively even distribution of dots along the timeline. This suggests this specific location category has a continuous degree of criminal activity. In contrast, the “BTP” (British Transport Police) location type exhibits a more sporadic pattern, with a smaller number of incidents concentrated in specific date ranges. This may suggest that crimes involving transportation are more concentrated or connected to particular occasions or situations.
This dot plot gives an excellent visualization of the spatial and temporal distribution of the reported crimes, even though it does not specifically display the frequency or severity of the incidents. This information could be valuable in guiding targeted interventions, resource allocation, and the development of more effective crime prevention strategies.
# Density plot
density_plt<-ggplot(crime23_data, aes(x = date)) +
geom_density(fill = "skyblue", color = "black") +
labs(title = "Density Plot of Crime Incidents by Date in Colchester",
x = "Date",
y = "Density") +
theme_minimal()
print(density_plt)
The density of criminal instances over a given time period is shown in the graph with a clear trend. With a peak value of almost 1.5 events per unit of time, the data indicates that the density of criminal incidents is highest around the start of the year. Following this first high, the density of occurrences progressively declines, reaching a low point near the end of the year of about 0.05 incidents per unit of time. The graph’s general bell-curve form suggests that crime incidences follow a normal distribution pattern, with most incidents happening around the start of the year and progressively decreasing over time.
Understanding the local crime trends over time with the help of this data may be helpful, and law enforcement organizations may find it useful in developing plans for preventing crimes or allocating resources.
# Histogram
# Histogram of Latitude
hist_of_lat<-ggplot(crime23_data, aes(x = lat, fill = ..x..)) +
geom_histogram( color = "black", bins = 30) +
labs(title = "Histogram of Latitude", x = "Latitude", y = "Frequency")
hist_plt <- ggplotly(hist_of_lat)
hist_plt
One of the most striking features of the histogram is that we can quickly see the specific latitude that each bar represents when we click on it. Or, as you move your mouse over each bar, watch as the chart tells you the latitude value right where you’re pointing. It’s dynamic, interactive, and full of discoveries—just like having a conversation with the data.
In the histogram, we can see the prominent peak around the latitude value of 51.88885, with a frequency of approximately 900 frequency. It also suggests that a considerable amount of the data is centered in this particular geographic area. There is a rather normal distribution with fewer occurrences at higher and lower latitudes, as indicated by the gradual tapering off on either side of this peak.
The lower, but still significant, peak at latitude 51.90 is another interesting finding.This secondary cluster of data points suggests that there may be multiple regions of interest or focal areas within the overall dataset. Several smaller, more scattered peaks at different latitude values, from around 51.88 to 51.90, are also visible on the histogram. These smaller spikes in frequency could represent specific points of interest or locations that warrant further investigation.
#Need to download this library plotly
#install.packages("plotly")
library(plotly)
median(crime23_data$lat)
[1] 51.889
# Scatter Plot
#scatter plot with a smoothed trendline for latitude and longitude
scatter_smooth_plt <- ggplot(crime23_data, aes(x = long, y = lat)) +
geom_point(alpha=0.5,color = "blue") +
geom_smooth(method = "lm", se = FALSE, color = "red") +
labs(title = "Scatter Plot of Latitude vs Longitude with Trendline", x = "Longitude", y = "Latitude")
plotly_plot <- ggplotly(scatter_smooth_plt)
plotly_plot
The scatter plot, which is based on latitude and longitude data, shows the spatial distribution of Colchester 2023 places. It displays a large range of longitude (0.88 to 0.92) and latitude (51.875 to 51.905) values. The figure is filled with scattered data points, with a dense cluster in the middle and sparser spots near the edges. Latitude and longitude are inversely correlated, as seen by a trendline with a negative slope.This suggests that latitude tends to decrease and vice versa when longitude rises. In addition, this interactive plot provides more functionality, allowing users to click on or move over individual data points to get specific details. Important details, like the exact longitude and latitude coordinates linked to the chosen point, are displayed with every movement or click. so we can understand this plot easily.
#Violin Plot
violin_plt <- ggplot(crime23_data, aes(x = category, y = lat, fill = category)) +
geom_violin() +
labs(title = "Violin Plot of Incident Distribution by Category", x = "Category", y = "Latitude") +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
plt <- ggplotly(violin_plt)
plt
The plot is a violin plot, which uses violin-shaped distributions to display the probability density of data across different categories. The latitude numbers are displayed on the y-axis and range roughly from 51.875 to 51.905, indicating that the data comes from a particular location in Colchester 2023.
The violin shapes for categories like “criminal-damage-arson” and “drugs” have their highest densities around latitude 51.9, indicating a higher concentration of incidents related to these crimes in areas represented by the higher latitudes. On the other hand, categories including “theft-from-the-person,” “vehicle-crime,” and “shoplifting” have higher densities around the lower end of the latitude range, or 51.88.
The distribution patterns of various crime types across latitudes are efficiently visualized by this violin plot, which enables us to identify possible hotspots or regions that need special attention for particular categories within the place under investigation. Also,as we move the mouse over the plot, little boxes pop up, showing you details about each group. This helps you understand the data better.
# Correlation Analysis
# Calculate the correlation between latitude and longitude
cor_matrix <- cor(crime23_data[, c("lat", "long")])
print(cor_matrix)
lat long
lat 1.00000000 -0.09025049
long -0.09025049 1.00000000
ggpairs(crime23_data[, c("lat", "long", "date")], title_size = 19)
The analysis reveals distinct spatial and temporal patterns within the dataset. Dates are uniformly distributed, latitude is concentrated around 100, and longitude exhibits dual trends. The coordinates (51.89, 0.89) are the core of a dense cluster of data points. Data subset variations are represented by parallel coordinate plots, and uniform and variable distributions are shown by bar charts. A minimal relationship is suggested by the slight negative correlation (-0.090) between latitude and longitude. A uniform distribution within the interquartile range is also indicated by the box plot’s lack of outliers.
# Time Series Plot
#First need to download this "zoo" library for "yearmon" function
#install.packages("zoo")
library(zoo)
# Extract year and month from the date column
crime23_data$year_month <- as.yearmon(crime23_data$date)
# Count number of crimes per year-month
crime23_counts <- crime23_data %>%
group_by(year_month) %>%
summarise(crime23_count = n())
# Plotting number of crimes per year-month
ggplot(crime23_counts, aes(x = year_month, y = crime23_count)) +
geom_line(color = "blue") +
geom_smooth(method = "loess", color = "red") +
labs(title = "Number of Crimes per Year-Month", x = "Year-Month", y = "Crime Count") +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
The red line, representing a specific type of crime, it starts at around 590 specific type of crime in January 2023, then drops sharply to around 550 in March 2023, before rising again to around 580 in May 2023.
The red line then experiences many fluctuations, and the line touches 590 incidents, the same as in January 2023 and in September 2023, reaching its lowest point of around 540 crimes after October 2023. This suggests that the specific type of crime represented by the red line experienced a notable decrease during this period.
The blue line shows a more volatile pattern compared to the red line. Looking at the data, we can observe that the crime rate starts relatively high in January 2023, reaching around 650 incidents. However, it then decreases gradually, hitting a low point of around 430 crimes in February 2023. The crime rate then rises again, peaking in May 2023 at approximately 551 incidents. After this peak, there were many ups and downs in every month, then the number of crimes touched around 640 incidents in September 2023, and again, the number of crimes declined once more, reaching around 580 in October 2023. This downward trend indicates that the crime rate was decreasing towards the latter part of the year.
The variations in the blue and red lines suggest that the frequency of criminal activity in the area is influenced by other or seasonal causes. By recognizing these trends, authorities can better target their resources and create methods for preventing crime.
The “zoo” library provides features for effectively managing time-series data.Data for each month is represented by the “yearmon” class.
#need to download this Libraries
#install.packages("ggmap")
library(ggmap)
#install.packages("leaflet")
library(leaflet)
# Map Visualization
map <- leaflet() %>%
addTiles() %>%
addCircleMarkers(
data = crime23_data,
lng = ~long,
lat = ~lat,
radius = 3,
color = "red",
popup = paste("Category: ", crime23_data$category, "<br>",
"Date: ", crime23_data$date, "<br>",
"Outcome: ", crime23_data$outcome, "<br>",
"Location: ", crime23_data$location)
)
# Display the map
map
The map visualization displays incident locations using a mapping tool. Each marker on the map corresponds to a specific incident location, showing its geographical position. When clicked, the markers provide information about the incident category. This interactive map offers a visual representation of incident distribution, enabling easy identification of geographic clusters or patterns within the dataset.
In short, we study of the Crime23 data has helped us learn more about when and where crimes happen in a specific area. By analyzing the data, we’ve discovered some trends that could help us figure out ways to stop crime. Moving forward, it’s important to use This information is critical for police, leaders, and communities because it allows them to better plan crime prevention and make better use of their resources to keep people safe.